THE LEAST-SQUARES ESTIMATION OF LATENT TRAIT VARIABLES
BY A HILBERT SPACE APPROACH


Kikumi Tatsuoka

University of Illinois at Urbana-Champaign

ABSTRACT

This research developed a new method for estimating a given latent
trait variable $\theta$ by the least-squares approach. The notion of a
multiple regression equation was reinterpreted in terms of the properties
of a Hilbert space, and a calculation formula for $\beta$ weights that can
be obtained recursively in the form of a Fourier series was derived. The
$\hat{\theta}$ values estimated by this method and by the maximum likelihood
method were compared using live data. It was shown that the $\hat{\theta}$
values estimated by the least-squares method were just as good as those
estimated by the maximum likelihood method. The advantage of this method
over the traditional method is that values of $\hat{\theta}$ are always
obtainable, even for a small number of items. The maximum likelihood method,
on the other hand, often fails to converge in such cases.


INTRODUCTION 


Latent trait theory has provided the only models that are applicable
to assessing an individual student's ability level in an adaptive testing
situation, where each student takes a different set of items. The parameters
of latent trait models have been estimated iteratively by the maximum
likelihood (MXL) method. This method provides quite accurate values for data
generated by the Monte Carlo method, but it often fails to converge, and so
cannot estimate the parameters, for live data that include even a very few
instances of certain kinds of erroneous data; it also fails for either very
high or very low $\theta$ values. This sensitivity puts us in an inconvenient
position when adaptive testing is used in practice, either as a diagnostic
test or as a posttest at the end of instruction. When diagnostic adaptive
testing is used for routing each student to his/her most appropriate
instructional unit in a series of lessons written on computer-based education
systems, nonconvergent $\hat{\theta}$ values are disastrous.

Several diagnostic adaptive tests (e.g., operations with signed
numbers and matrix algebra lessons) were implemented on the PLATO system,
and they have been administered at a junior high school and at the University
of Illinois this fall semester. The results from these tests were used for
routing students to their levels in the instructional materials prepared
on the system. The maximum likelihood method failed to provide estimated
$\theta$ values for about 10 percent of the students. Part of the blame might
be attributed to the fact that the U of I system of Computer Based Education
(CBE) is used more widely for students who are either very advanced or slow
learners than for average students in the public school systems of Champaign
and Urbana. In addition, the number of subjects used to calibrate item
discriminating powers and difficulties was far less than the 2000 that has
customarily been required. It is, indeed, impossible to collect data from as
many as 2000 subjects with our CBE system, especially in order to develop
diagnostic or criterion-referenced adaptive tests as measures of achievement
either before or after a series of instructions, even though it is well known
that the PLATO system at the U of I has the largest usage in the country.

The purpose of this study is to propose a new model that takes
advantage of the powerful properties of a Hilbert space and in which
estimated $\theta$ values are always obtained. The coefficients, i.e., the
$\beta$ weights in the multiple regression equation (which, in the Hilbert
space approach, are obtained by projecting $\theta$ onto the space spanned by
given predictor vectors), are used to estimate $\theta$ with the help of
Fourier series. The results are compared with those obtained by the MXL
estimation method.

The accuracy of the newly presented least-squares (LS) estimation
method of $\theta$ is discussed in the following sections. It can be shown
that the properties of multiple R and the standard error of estimate,
developed in terms of norms and Fourier coefficients, exactly parallel those
from the theory of multiple regression.

MULTIPLE REGRESSION FUNCTIONS AND PROJECTION OPERATORS

Hilbert space: Suppose $X_g$ is the raw-score vector of a given item $g$ in
a population, or more precisely in a space spanned by the persons in the
population. This space can be a complete latent space of the unidimensional
latent trait $\theta$ for a given set of $n$ items. In latent trait theory,
the response pattern for person $i$ is crucial for determining the likelihood
function, but the focus of our interest in this paper will be switched to
the score vector of item $g$ in the person-space, where the two-parameter
latent trait model is considered. Let us consider a deviation score vector
$x_g = (X_{g1}, X_{g2}, \ldots, X_{gN}) - (\mu_g, \mu_g, \ldots, \mu_g)$ for
convenience in the further work. A set of deviation score vectors $\{x_g\}$
will be called a Hilbert space if an appropriate metric, or norm, is defined.

If the norm of $x_g$ is defined by

$$\|x_g\| = \Big(\sum_{i=1}^{N} x_{gi}^2\Big)^{1/2} = (x_g, x_g)^{1/2},$$

then the standard deviation of item $g$ becomes the norm of $x_g$ divided by
$\sqrt{N-1}$, and the covariance of two vectors $x_g$ and $x_f$ is the inner
product of these two vectors divided by $N-1$. Two vectors $x_g$ and $x_f$
are said to be orthogonal if their inner product equals zero, that is,
$(x_g, x_f) = 0$. A vector $x_g$ is said to be orthogonal to a set $M$ if
$x_g$ is orthogonal to every vector in $M$. These properties are direct
consequences of the definition of the inner product of any two vectors, and
a Hilbert space equipped with them provides a wealth of structural
characteristics that yield very useful analytical results applicable to many
problems. For example, the shortest vector from a point to a subspace $M$ in
$n$-dimensional Euclidean space is orthogonal to the subspace $M$. This is a
special case of the optimization principle called the projection theorem.
Tatsuoka (1975) discussed in detail the relationship among projection
operators, regression functions, and multiple regression functions, so only
a brief explanation will be given here.


The projection theorem can be explained briefly as follows: For
a given vector $x$ and a subspace $M$, there exists a unique vector $m$ in
$M$ which is closest to $x$ in the sense that it minimizes $\|x - m\|$.
Furthermore, a necessary and sufficient condition that $m$ be the unique
minimizing vector is that $x - m$ be orthogonal to $M$. Hence, $m$ can be
called the multiple regression function $R(x \mid M)$, and the residual will
be $x - m$, which is orthogonal to $m = R(x \mid M)$. If $M$ is spanned by a
single vector, then $R(x \mid M)$ is the regression function of $x$ onto $M$.
It is always true that $R(x \mid M)$ is linear and idempotent; that is, for
any real numbers $s$ and $t$,

$$R(s x_g + t x_f \mid M) = s R(x_g \mid M) + t R(x_f \mid M)$$

$$R(R(x_g \mid M) \mid M) = R(x_g \mid M).$$
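The properties stated above can be checked numerically. The following is a minimal sketch, assuming $M$ is spanned by a single vector; the score vectors and the function names `inner` and `regress_on_vector` are hypothetical and serve only as an illustration.

```python
def inner(u, v):
    """Inner product of two equal-length score vectors."""
    return sum(a * b for a, b in zip(u, v))


def regress_on_vector(x, m):
    """R(x | M) when M is spanned by the single nonzero vector m."""
    coef = inner(x, m) / inner(m, m)
    return [coef * mi for mi in m]


# Hypothetical vectors for N = 4 persons (illustrative values only).
x_g = [1.0, -2.0, 0.5, 0.5]
x_f = [0.0, 1.0, -3.0, 2.0]
m = [1.0, 1.0, -1.0, -1.0]

proj = regress_on_vector(x_g, m)
residual = [a - b for a, b in zip(x_g, proj)]

# Projection theorem: the residual x - R(x | M) is orthogonal to M.
orthogonal = abs(inner(residual, m)) < 1e-12

# Idempotence: regressing the projection again changes nothing.
again = regress_on_vector(proj, m)
idempotent = all(abs(a - b) < 1e-12 for a, b in zip(again, proj))

# Linearity: R(s*x_g + t*x_f | M) = s*R(x_g | M) + t*R(x_f | M).
s, t = 2.0, -3.0
combo = [s * a + t * b for a, b in zip(x_g, x_f)]
lhs = regress_on_vector(combo, m)
rhs = [s * a + t * b for a, b in zip(proj, regress_on_vector(x_f, m))]
linear = all(abs(a - b) < 1e-12 for a, b in zip(lhs, rhs))

print(orthogonal, idempotent, linear)
```

The single-vector case is chosen only because the projection then has a closed form; the general case needs an orthonormal basis of $M$.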

In order to express $R(x_g \mid M)$ in the practical form of a linear
function, we need to introduce briefly some further properties of Hilbert
space. The notions of an orthonormal basis and of Fourier series are
introduced in the following.

An orthonormal set of vectors: A set of vectors $e_i$, $i = 1, \ldots, n$
(denoted by $\{e_i\}$ from now on) is said to be orthonormal if the vectors
in the set are mutually orthogonal and each has norm equal to unity. The
vectors of an orthonormal set are linearly independent, since the inner
product of any two distinct vectors in the set equals zero. In Hilbert space,
such a set always exists and can be easily constructed by the Gram-Schmidt
orthogonalization procedure. It is clear that the $e_i$'s in the set of
orthonormal vectors generate the same space as that generated by the
$x_i$'s.
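The Gram-Schmidt construction mentioned above can be sketched as follows. This is an illustrative implementation of the classical procedure, not code from the original study; the function names and the example vectors are assumptions.

```python
import math


def inner(u, v):
    """Inner product of two equal-length score vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(xs, tol=1e-12):
    """Return an orthonormal set spanning the same space as xs.

    Classical Gram-Schmidt: each new vector is x_i minus its regression
    on the earlier e_j, divided by the norm of that residual.
    """
    es = []
    for x in xs:
        residual = list(x)
        for e in es:
            c = inner(x, e)  # Fourier coefficient (x_i, e_j)
            residual = [r - c * ei for r, ei in zip(residual, e)]
        k = math.sqrt(inner(residual, residual))
        if k < tol:
            raise ValueError("vectors are linearly dependent")
        es.append([r / k for r in residual])
    return es


# Hypothetical deviation-score vectors for 3 items and N = 4 persons.
xs = [
    [1.0, -1.0, 0.0, 0.0],
    [1.0, 0.0, -1.0, 0.0],
    [1.0, 1.0, 1.0, -3.0],
]
es = gram_schmidt(xs)

# Print the matrix of inner products (e_i, e_j); it should be the identity.
for ei in es:
    print([round(inner(ei, ej), 10) for ej in es])
```

Dependent predictors are rejected explicitly here because the normalizing constant would otherwise be zero.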

Normal equations in the context of Hilbert space: Our concern in this
study is to express the multiple regression of $\theta$ onto the set
$\{x_i\}$ as a linear combination of the $x_i$'s,
$\hat{\theta} = d_1 x_1 + d_2 x_2 + \cdots + d_n x_n$, which is equivalent to
finding the $n$ scalars $d_i$, $i = 1, \ldots, n$, that minimize
expression (1),

$$\|\theta - d_1 x_1 - d_2 x_2 - \cdots - d_n x_n\| \tag{1}$$

or, equivalently, that set the inner product equal to zero,

$$(\theta - d_1 x_1 - d_2 x_2 - \cdots - d_n x_n,\; x_i) = 0 \tag{2}$$

for each $i$. By rewriting Equation (2), we obtain Equation (3).

$$\begin{aligned}
d_1 (x_1, x_1) + d_2 (x_2, x_1) + \cdots + d_n (x_n, x_1) &= (\theta, x_1) \\
d_1 (x_1, x_2) + d_2 (x_2, x_2) + \cdots + d_n (x_n, x_2) &= (\theta, x_2) \\
&\;\;\vdots \\
d_1 (x_1, x_n) + d_2 (x_2, x_n) + \cdots + d_n (x_n, x_n) &= (\theta, x_n)
\end{aligned} \tag{3}$$

This set of $n$ equations in the $n$ coefficients $d_i$ is known as the
normal equations for the least-squares problem if $\theta$ is replaced by any
criterion vector. If the $n \times n$ matrix of $(x_i, x_j)$,
$i, j = 1, \ldots, n$, has a nonzero determinant, then Equation (3) can be
solved for the coefficients $d_i$. Since
$(\theta, x_i) = (N-1)\,\rho_{\theta i}\,\sigma(\theta)\,\sigma(x_i)$, the
right-hand sides of Equations (3) will be replaced by the known values
$a_i (N-1)\,\sigma(x_i) / \sqrt{1 + a_i^2}$ (Lord and Novick, 1968) if we
assume $\theta \sim N(0,1)$, where $a_i$ is the item discriminating power of
item $i$.
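To make the normal equations concrete, the sketch below builds the matrix of inner products and solves Equation (3) by Gaussian elimination. It is only an illustration: the criterion vector `theta` is treated as known here, whereas in the application the right-hand sides are replaced by the values computed from the $a_i$. All names and data are hypothetical.

```python
def inner(u, v):
    """Inner product of two equal-length score vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve_linear(a, b):
    """Solve a*d = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix of inner products is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    d = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * d[j] for j in range(i + 1, n))
        d[i] = (m[i][n] - s) / m[i][i]
    return d


def normal_equation_weights(xs, theta):
    """Coefficients d_i of Equation (3) for predictors xs and criterion theta."""
    gram = [[inner(xj, xi) for xj in xs] for xi in xs]
    rhs = [inner(theta, xi) for xi in xs]
    return solve_linear(gram, rhs)


# Hypothetical deviation-score vectors, N = 5 persons, n = 3 items.
xs = [
    [1.0, -1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, -1.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, -3.0, 0.0],
]
theta = [0.5, -1.0, 1.5, -2.0, 1.0]

d = normal_equation_weights(xs, theta)
theta_hat = [sum(di * x[p] for di, x in zip(d, xs)) for p in range(len(theta))]
residual = [t - h for t, h in zip(theta, theta_hat)]
print(d)
```

The residual of the fitted combination should be orthogonal to every predictor, which is exactly the condition expressed by Equation (2).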

The evaluation of an $n \times n$ determinant is not an easy task,
especially if $n$ is large. As an alternative approach, we will introduce an
analytical method using Fourier series to estimate $\theta$.

Fourier series and Gram-Schmidt procedure: Let us apply the Gram-Schmidt
procedure to the set $\{x_1, x_2, \ldots, x_n\}$ (or $\{x_i\}$), obtaining an
orthonormal set $\{e_1, e_2, \ldots, e_n\}$. Then any vector $y$ in the space
is approximated in the form of a Fourier series (Simmons, 1963; Dunford and
Schwartz, 1964; Tatsuoka, 1975) as follows:

$$\hat{y} = \sum_{i=1}^{n} (y, e_i)\, e_i \tag{4}$$

Since $y - \hat{y}$ is orthogonal to the space spanned by $\{e_i\}$, which is
equivalent to saying that $y - \hat{y}$ is orthogonal to the space spanned by
$\{x_i\}$, the estimate $\hat{y}$ is easily obtained once the independent
vectors $x_i$ are orthogonalized. Equation (4) relates closely to the
Gram-Schmidt procedure because the numerator of Equation (5) below is the
residual vector $x_i - \hat{x}_i$, where $\hat{x}_i$ is the regression of
$x_i$ onto the space spanned by $\{x_1, \ldots, x_{i-1}\}$, or equivalently
by $\{e_1, \ldots, e_{i-1}\}$; moreover, $x_i - \hat{x}_i$ is orthogonal to
that space.

$$e_i = \frac{x_i - R(x_i \mid e_1, e_2, \ldots, e_{i-1})}
             {\|x_i - R(x_i \mid e_1, e_2, \ldots, e_{i-1})\|}
      = \frac{x_i - R(x_i \mid e_1, e_2, \ldots, e_{i-1})}{K_i}, \text{ say.} \tag{5}$$

The $e_i$, $i = 1, \ldots, n$, are normalized residual vectors, while the
$x_i - \hat{x}_i$ are not.

If we consider the space spanned by $\{x_1, \ldots, x_n, \theta\}$, adding
$\theta$ to $\{x_i\}$, then the last orthogonalized vector $e_{n+1}$ produced
by the Gram-Schmidt procedure has the numerator $\theta - \hat{\theta}$,
where the estimate $\hat{\theta}$ is given by Equation (6) below, obtained by
replacing the vector $y$ in Equation (4) by $\theta$.

$$\hat{\theta} = \sum_{i=1}^{n} (\theta, e_i)\, e_i \tag{6}$$

Also, the Gram-Schmidt process can be said to be a procedure for inverting
the matrix given by Equation (3), and can itself be viewed as the solution
of a minimum norm approximation, or least-squares, problem.
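Equations (4) through (6) can be sketched in a few lines. The example below orthogonalizes a hypothetical set of predictors and forms the Fourier-series estimate of a criterion vector; the data and function names are assumptions made for illustration, and the criterion is treated as known so that the result can be checked.

```python
import math


def inner(u, v):
    """Inner product of two equal-length score vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(xs):
    """Orthonormalize xs following Equation (5)."""
    es = []
    for x in xs:
        residual = list(x)
        for e in es:
            c = inner(x, e)
            residual = [r - c * ei for r, ei in zip(residual, e)]
        k = math.sqrt(inner(residual, residual))  # K_i in Equation (5)
        if k < 1e-12:
            raise ValueError("vectors are linearly dependent")
        es.append([r / k for r in residual])
    return es


def fourier_estimate(theta, xs):
    """Return (theta_hat, Fourier coefficients) as in Equation (6)."""
    es = gram_schmidt(xs)
    coefs = [inner(theta, e) for e in es]
    est = [sum(c * e[p] for c, e in zip(coefs, es)) for p in range(len(theta))]
    return est, coefs


# Hypothetical deviation-score vectors, N = 5 persons, n = 3 items.
xs = [
    [1.0, -1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, -1.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, -3.0, 0.0],
]
theta = [0.5, -1.0, 1.5, -2.0, 1.0]

theta_hat, coefs = fourier_estimate(theta, xs)
residual = [t - h for t, h in zip(theta, theta_hat)]

# The residual is orthogonal to every predictor, as the normal equations require.
print([abs(inner(residual, x)) < 1e-9 for x in xs])
```

No matrix is inverted explicitly; the orthogonalization does that work, which is the point made in the closing sentence above.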

ACCURACY OF THE ESTIMATION

Standard error of prediction: The variance error of prediction is $1/(N-1)$
times the squared norm of the residual vector $\theta - \hat{\theta}$, which
is also the difference of the squared norms of $\theta$ and $\hat{\theta}$:

$$\sum_{i=1}^{N} (\theta_i - \hat{\theta}_i)^2
  = (\theta - \hat{\theta},\; \theta - \hat{\theta})
  = (\theta, \theta) - (\hat{\theta}, \hat{\theta})
  = \|\theta\|^2 - \|\hat{\theta}\|^2 \tag{7}$$

where $N$ is the number of subjects. The standard error of prediction is the
square root of this variance.
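The step from the inner product of the residual to the difference of squared norms can be written out as follows. This short derivation is added for clarity and uses only the projection theorem: the residual $\theta - \hat{\theta}$ is orthogonal to the space spanned by the predictors, and $\hat{\theta}$ lies in that space, so $(\theta - \hat{\theta}, \hat{\theta}) = 0$.

```latex
\begin{aligned}
(\theta - \hat{\theta},\, \theta - \hat{\theta})
  &= (\theta - \hat{\theta},\, \theta) - (\theta - \hat{\theta},\, \hat{\theta}) \\
  &= (\theta, \theta) - (\hat{\theta}, \theta) - 0 \\
  &= (\theta, \theta) - \bigl[(\hat{\theta}, \hat{\theta})
       + (\hat{\theta},\, \theta - \hat{\theta})\bigr] \\
  &= \|\theta\|^2 - \|\hat{\theta}\|^2 .
\end{aligned}
```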

Equation (7) describes a relation between the estimate $\hat{\theta}$ and
$\theta$. If the norms of $\theta$ and $\hat{\theta}$ are the same, then the
standard error of prediction will be zero. Moreover, the latent trait
$\theta$ always has a norm at least as large as that of the estimate,
$\|\hat{\theta}\| \le \|\theta\|$, since the left-hand side of Equation (7)
is always nonnegative. In other words, $\theta$ has a standard deviation at
least as large as that of $\hat{\theta}$.


Multiple R: The correlation of the estimate $\hat{\theta}$ and $\theta$
itself indicates to what extent $\hat{\theta}$ represents the unknown vector
$\theta$. In order to show the relationship between these two vectors, some
further properties are needed.

The inner product of $\theta$ and $\hat{\theta}$ is the sum of squared
Fourier coefficients:

$$(\theta, \hat{\theta}) = \sum_{i=1}^{n} (\theta, e_i)^2 \tag{8}$$

A simple calculation leads us to the relation that the squared norm of
$\hat{\theta}$ is also the sum of squared Fourier coefficients, given as
follows:

$$\|\hat{\theta}\|^2
  = \Big(\sum_{i=1}^{n} (\theta, e_i)\, e_i,\; \sum_{i=1}^{n} (\theta, e_i)\, e_i\Big)
  = \sum_{i=1}^{n} (\theta, e_i)^2 \tag{9}$$

since $(e_i, e_j) = 0$ for $i \ne j$ and $(e_i, e_j) = 1$ for $i = j$.
Therefore Equation (10) below, the relation that the squared norm of
$\hat{\theta}$ equals the inner product of $\theta$ and $\hat{\theta}$, is
true.

$$(\theta, \hat{\theta}) = \|\hat{\theta}\|^2 \tag{10}$$

Let us denote the covariance of $\theta$ and $\hat{\theta}$ by
$\mathrm{cov}(\theta, \hat{\theta})$, which is the inner product of $\theta$
and $\hat{\theta}$ divided by $N-1$, and the standard deviation of
$\hat{\theta}$ by $\sigma(\hat{\theta})$.

$$\mathrm{cov}(\theta, \hat{\theta}) = (\theta, \hat{\theta}) / (N-1) \tag{11}$$

$$\sigma^2(\hat{\theta}) = \|\hat{\theta}\|^2 / (N-1) \tag{12}$$

Hence,

$$\mathrm{cov}(\theta, \hat{\theta}) = \sigma^2(\hat{\theta}) \tag{13}$$

The squared correlation between $\theta$ and $\hat{\theta}$ is given by

$$R^2 = \mathrm{corr}^2(\theta, \hat{\theta})
      = \frac{1}{\|\theta\|^2} \sum_{i=1}^{n} (\theta, e_i)^2 \tag{14}$$

Meanwhile, the squared correlation between $\theta$ and $e_i$ is given by

$$R_i^2 = \mathrm{corr}^2(\theta, e_i) = \frac{1}{\|\theta\|^2}\, (\theta, e_i)^2$$

Therefore, the squared multiple R of the multiple regression of $\theta$
equals the sum of the squared correlations between $\theta$ and the $e_i$;
thus Equation (15) holds:

$$R^2 = R_1^2 + R_2^2 + \cdots + R_n^2 \tag{15}$$
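Equation (15) can be verified numerically. The sketch below compares the squared correlation of a hypothetical criterion vector with its Fourier-series estimate against the sum of its squared correlations with the orthonormal vectors. All vectors are assumed to be deviation scores (zero mean), so that a correlation reduces to a normalized inner product; the data and names are illustrative assumptions.

```python
import math


def inner(u, v):
    """Inner product of two equal-length score vectors."""
    return sum(a * b for a, b in zip(u, v))


def corr(u, v):
    """Correlation of two deviation-score (zero-mean) vectors."""
    return inner(u, v) / math.sqrt(inner(u, u) * inner(v, v))


def gram_schmidt(xs):
    """Orthonormalize linearly independent vectors xs."""
    es = []
    for x in xs:
        residual = list(x)
        for e in es:
            c = inner(x, e)
            residual = [r - c * ei for r, ei in zip(residual, e)]
        k = math.sqrt(inner(residual, residual))
        es.append([r / k for r in residual])
    return es


# Hypothetical zero-mean vectors, N = 5 persons, n = 3 items.
xs = [
    [1.0, -1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, -1.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, -3.0, 0.0],
]
theta = [0.5, -1.0, 1.5, -2.0, 1.0]

es = gram_schmidt(xs)
theta_hat = [sum(inner(theta, e) * e[p] for e in es) for p in range(len(theta))]

r_sq_direct = corr(theta, theta_hat) ** 2          # left side of Equation (15)
r_i_sq = [corr(theta, e) ** 2 for e in es]         # the components R_i^2
print(r_sq_direct, sum(r_i_sq))
```

The two printed values should agree up to rounding, and both are bounded above by one.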

Let us change the notation of the component $R_i$ in Equation (15) to
$R_{\theta e_i}$, so that the distinction from $R_{\hat{\theta} e_i}$, the
correlation between $\hat{\theta}$ and $e_i$, will be more noticeable. It can
be shown that $R_{\hat{\theta} e_j}$ is given by the following formula:

$$R^2_{\hat{\theta} e_j} = \frac{(\theta, e_j)^2}{\sum_{i=1}^{n} (\theta, e_i)^2}$$

The numerators of $R^2_{\hat{\theta} e_j}$ and $R^2_{\theta e_j}$ are the
same, but the denominators are the squared norms of $\hat{\theta}$ and
$\theta$, respectively. It should be noted that the correlation of $\theta$
and $e_j$, $R_{\theta e_j}$, is smaller than or equal to the correlation of
$\hat{\theta}$ and $e_j$, $R_{\hat{\theta} e_j}$, because
$\|\hat{\theta}\| \le \|\theta\|$. That is,
$R_{\hat{\theta} e_j} \ge R_{\theta e_j}$: the correlation of the estimate
$\hat{\theta}$ with each $e_j$ is always larger than or equal to the
correlation of $\theta$ itself with each $e_j$. $R^2_{\hat{\theta} e_j}$ will
be equal to $R^2_{\theta e_j}$ if and only if
$\|\hat{\theta}\| = \|\theta\|$.


Complete Hilbert space: If $\{e_i\}$ is an orthonormal set in a Hilbert
space, then Bessel's inequality, Equation (16), is always true.

$$\sum_{i=1}^{n} (\theta, e_i)^2 \le \|\theta\|^2 \tag{16}$$

To prove this, take the squared norm of
$\theta - \sum_{i=1}^{n} (\theta, e_i)\, e_i$, that is,
$\|\theta - \hat{\theta}\|^2$; we obtain

$$\Big\|\theta - \sum_{i=1}^{n} (\theta, e_i)\, e_i\Big\|^2
  = \|\theta\|^2 - \|\hat{\theta}\|^2$$

Since the left-hand side of the above equation is always nonnegative,
inequality (16) is always true for $\{e_i\}$ and $\theta$; the result can be
generalized to any vector $y$ and any orthonormal set.

It is interesting to note that the norm of the estimate $\hat{\theta}$ is
always smaller than or equal to the norm of $\theta$ itself. The degree of
accuracy in estimating $\theta$ can be measured by the difference of the
norms. Geometrically, $\|\theta\|$ means that the endpoint of $\theta$ is on
a circle of radius $\|\theta\|$ with the origin as its center. Hence the
magnitude of $\|\theta\| - \|\hat{\theta}\|$ is the extent to which the
circle of radius $\|\hat{\theta}\|$ is close to the circle of radius
$\|\theta\|$. Since the standard error of prediction is given by
$\sqrt{(\|\theta\|^2 - \|\hat{\theta}\|^2)/(N-1)}$, it can be restated
geometrically that the difference of the areas of the two circles with radii
$\|\theta\|$ and $\|\hat{\theta}\|$ is proportional to the squared standard
error of prediction.

We know that the squared correlation of $\theta$ and $e_i$,
$R^2_{\theta e_i}$, is generally smaller than the squared correlation of
$\hat{\theta}$ with $e_i$, $R^2_{\hat{\theta} e_i}$. This relation implies
that the correlation of $\hat{\theta}$ with each $e_i$ tends to provide a
somewhat inflated R value compared with the correlation of the latent trait
variable $\theta$ itself with $e_i$. The squared correlation of $\theta$ and
$\hat{\theta}$ is expressed as the sum of the squared correlations of
$\theta$ with each element of $\{e_i\}$ according to Equation (15). If we
replace each term of Equation (15) by the estimated squared correlation
$R^2_{\hat{\theta} e_i}$, then the newly obtained estimate of
$R^2_{\theta\hat{\theta}}$ will be larger than the original squared
multiple R. This fact warns us that the estimated $R^2$ is inflated if we
use the estimate $\hat{\theta}$ instead of $\theta$ itself in practice.

The concept of the complete latent trait space for $n$ items in the
theory of latent trait models implies that the parameters obtained from the
models are population free and should not be subject to either sampling
errors or errors of measurement in the Classical Test Theory sense. Lord
and Novick (1968) derived the equivalent properties of the assumption of
local independence in Chapter 16 (pp. 358-362) of their book. I think the
argument in this chapter is somewhat misleading because they introduced the
concept of the complete latent trait space in the context of a
multidimensional complete latent trait space and, at the same time,
discussed the assumption of local independence there. So far as the
discussion in the book is concerned, local independence is valid only in the
unidimensional complete latent trait space, and not in multidimensional
space. Moreover, the subsequent discussions and theories in Chapters 17
through 19 are really only for a unidimensional space.

If we accept the existence of such a unidimensional complete latent
trait space for $n$ items and the assumption of local independence, then we
are supposed to be able to obtain the sample-free and population-free
parameters $a$ and $b$ theoretically. One might wonder whether or not this
nice feature can be extended to our model.

Let us restate the concept of a unidimensional complete latent
trait space in vector-analytic terms. If the given $n$ items are enough to
measure the latent trait $\theta$ that affects the performance on these
items in some population, then the vector $\theta$ lies in the space spanned
by the $n$ items. Therefore $\theta$ can be expressed as a linear combination
of these $n$ vectors, $\{x_i\}$. An orthonormal set $\{e_i\}$ in Hilbert
space is said to be complete if it is maximal, in other words if it is
impossible to adjoin a new vector $e_{n+1}$ to the set $\{e_i\}$ in such a
way that $\{e_i, e_{n+1}\}$ is an orthonormal set which contains $\{e_i\}$ as
a proper subset. If $\{e_i\}$ is complete and $x$ is an arbitrary vector in
the space, then

$$x = \sum_{i=1}^{n} (x, e_i)\, e_i \tag{17}$$

and the squared norm of $x$ is

$$\|x\|^2 = \sum_{i=1}^{n} (x, e_i)^2. \tag{18}$$

Since $x$ is an arbitrary vector in the space, and $\theta$ lies in the space
spanned by $\{x_i\}$, the $x$ in Equation (17) can be replaced by $\theta$.
So, Equations (17) and (18) will be rewritten as follows:

$$\theta = \sum_{i=1}^{n} (\theta, e_i)\, e_i \tag{19}$$

and the squared norm of $\theta$ is given by (20),

$$\|\theta\|^2 = \sum_{i=1}^{n} (\theta, e_i)^2 \tag{20}$$

Note that Equation (19) is an exact expression of $\theta$ itself and not the
estimate $\hat{\theta}$. From Equations (20) and (7), the standard error of
prediction will be zero in this case. As long as a unidimensional complete
latent space for $n$ items exists, $\theta$ should be obtained by
Equation (19).

Calculation Formula: The estimate of $\theta$ was given in Equation (6), but
note that the Fourier coefficients $(\theta, e_i)$, $i = 1, \ldots, n$,
include the unknown variable $\theta$. Therefore they must be approximated by
measurable variables. This can be achieved by first rewriting $e_i$ as a
linear combination of the $x_k$, which gives Equation (21).

$$e_i = \sum_{k=1}^{i} \alpha_{ik}\, x_k \tag{21}$$

It should be noted that the coefficients $\alpha_{ik}$ are determined
recursively; with $K_i$ as in Equation (5), they are given as follows:

$$\alpha_{ii} = \frac{1}{K_i}, \qquad
  \alpha_{ik} = -\frac{1}{K_i} \sum_{j=k}^{i-1} (x_i, e_j)\, \alpha_{jk}
  \quad (k < i) \tag{22}$$

Substituting Equation (21) for $e_i$ in Equation (6), $\hat{\theta}$ is
expressed as a linear combination of the $x_k$:

$$\hat{\theta}
  = \sum_{i=1}^{n} \Big( \sum_{k=1}^{i} (\theta, e_i)\, \alpha_{ik}\, x_k \Big)
  = \sum_{k=1}^{n} \Big( \sum_{j=k}^{n} (\theta, e_j)\, \alpha_{jk} \Big) x_k \tag{23}$$

where the last member in the above equations is obtained by summing the
terms column-wise, while the terms in the second member are summed row-wise.

Next, let us show that the Fourier coefficient $(\theta, e_j)$ can be
approximated by a function of the item discriminating indices $a_i$, which
were calibrated earlier in a large sample by the MXL estimation method.
Without loss of generality, we can assume that the latent trait $\theta$
follows the normal distribution with mean 0 and standard deviation unity.
Substituting Equation (21) for $e_i$ in $(\theta, e_i)$, and taking the
summation over $k$ and the coefficients $\alpha_{ik}$ outside the inner
product, gives the middle term of Equation (24) below. Since
$(\theta, x_k)$ is the product of the correlation between $x_k$ and
$\theta$, $\rho_{\theta x_k}$, the standard deviation of $x_k$,
$\sigma(x_k)$, and the multiplier $N-1$, we obtain the rightmost term of
Equation (24) by substituting this relation for $(\theta, x_k)$.

$$(\theta, e_i)
  = \sum_{k=1}^{i} \alpha_{ik}\, (\theta, x_k)
  = \sum_{k=1}^{i} \alpha_{ik}\, \rho_{\theta x_k}\, \sigma(x_k)\, (N-1) \tag{24}$$

Replacing the Fourier coefficients in Equation (23) by the rightmost member
of Equation (24), the final form for actual calculation is derived as
follows:

$$\hat{\theta} = (N-1) \sum_{k=1}^{n} \Big[ \sum_{j=k}^{n}
  \Big( \sum_{i=1}^{j} \rho_{\theta x_i}\, \sigma(x_i)\, \alpha_{ji} \Big)
  \alpha_{jk} \Big] x_k \tag{25}$$

Since the correlation of $\theta$ and $x_i$ is approximated by (Lord and
Novick, 1968)

$$\rho(\theta, x_i) = \frac{a_i}{\sqrt{1 + a_i^2}},$$

the coefficients of $x_k$ in Equation (25) are expressed by

$$\beta_k = (N-1) \sum_{j=k}^{n} \sum_{i=1}^{j}
  \frac{a_i}{\sqrt{1 + a_i^2}}\, \sigma(x_i)\, \alpha_{ji}\, \alpha_{jk}$$

Hence, the $\beta$s for earlier terms in the multiple regression equation
remain unaltered when subsequent terms are added in a stepwise manner.
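The recursive computation of the $\alpha_{ik}$ and the resulting $\beta_k$ can be sketched as below. This is an illustration under stated assumptions, not the author's program: it works from the matrix of inner products $(x_i, x_k)$ alone, takes the inner products $(\theta, x_k)$ as input, and uses hypothetical function names and data. The helper `approx_theta_x` shows how those inner products would be approximated from the discriminating powers when $\theta$ is unobserved.

```python
import math


def alpha_weights(gram):
    """Lower-triangular alpha with e_i = sum_k alpha[i][k] * x_k (Equation 21).

    gram[i][k] is the inner product (x_i, x_k).
    """
    n = len(gram)
    alpha = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # (x_i, e_j) for j < i, using the rows of alpha already computed.
        c = [sum(alpha[j][k] * gram[i][k] for k in range(j + 1)) for j in range(i)]
        k_i = math.sqrt(gram[i][i] - sum(cj * cj for cj in c))  # norm of residual
        alpha[i][i] = 1.0 / k_i
        for k in range(i):
            alpha[i][k] = -sum(c[j] * alpha[j][k] for j in range(k, i)) / k_i
    return alpha


def beta_weights(alpha, theta_x):
    """beta_k = sum_{j>=k} (theta, e_j) * alpha[j][k], with theta_x[k] = (theta, x_k)."""
    n = len(alpha)
    fourier = [sum(alpha[j][i] * theta_x[i] for i in range(j + 1)) for j in range(n)]
    return [sum(fourier[j] * alpha[j][k] for j in range(k, n)) for k in range(n)]


def approx_theta_x(a, sd, n_persons):
    """Approximate (theta, x_k) by (N-1) * sigma(x_k) * a_k / sqrt(1 + a_k^2)."""
    return [(n_persons - 1) * s * ak / math.sqrt(1.0 + ak * ak) for ak, s in zip(a, sd)]


# Hypothetical inner products for three items and a hypothetical (theta, x_k).
gram = [[2.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 12.0]]
theta_x = [1.5, -1.0, 7.0]

alpha = alpha_weights(gram)
beta = beta_weights(alpha, theta_x)
print(beta)
```

Because the weights are built from the orthogonalized predictors, they should satisfy the normal equations for the same inner products, which gives a direct check on the recursion.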


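Note that the $x_j$ are deviation score vectors; hence Equation (25) is
written simply as the linear combination of the $x_k$ with coefficients
$\beta_k$, which is an ordinary multiple regression equation with
standardized coefficients.

$$\hat{\theta} = \sum_{k=1}^{n} \beta_k\, x_k = R(\theta \mid x_1, \ldots, x_n) \tag{26}$$

If raw scores are given first, then the mean of each item $j$ will be
approximated by Equation (27),

$$\mu(x_j) = \frac{1}{\sqrt{2\pi}} \int_{\gamma_j}^{\infty} \exp(-t^2/2)\, dt
           = \Phi\!\left( \frac{-a_j b_j}{\sqrt{1 + a_j^2}} \right),
  \qquad \gamma_j = \frac{a_j b_j}{\sqrt{1 + a_j^2}} \tag{27}$$

Finally, the squared multiple R of $\theta$ and $\hat{\theta}$ is given by
Equation (28) as follows:

$$R^2_{\theta\hat{\theta}} = (N-1) \sum_{i=1}^{n}
  \Big( \sum_{k=1}^{i} \alpha_{ik}\, \sigma(x_k)\, \frac{a_k}{\sqrt{1 + a_k^2}} \Big)^2 \tag{28}$$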

APPLICATION TO ADAPTIVE TESTING

The estimation formula for $\hat{\theta}$, Equation (26), is indeed the
linear multiple regression function of the criterion variable $\theta$
regressed on the predictors $x_k$ with standardized $\beta$ weights. The
coefficients $\beta_k$ should be calculated at the same time as the $a_i$
and $b_i$ of the latent trait models are calibrated. Then it is easy to
estimate an individual latent trait value $\hat{\theta}_i$ by using
Equation (26) when an adaptive test is administered. The situation is
similar to that in a cross-validation study in multiple regression analysis.

In the adaptive testing situation, an item is chosen so as to
maximize the information function of a set of items. In this case,
subsequent items will be picked so as to maximize the increment of
$R^2_{\theta\hat{\theta}}$ given in Equation (28). We should recall that
stepwise multiple regression analysis usually does not pick the best set of
items, that is, the set that yields the maximum value of multiple R among
all possible subsets. Also, multiple R is a positively biased statistic;
therefore the resulting R value is inflated. Moreover, as we stated earlier
in this paper, a multiple R becomes larger than it should be when each
component $R_{\theta e_i}$ is replaced by $R_{\hat{\theta} e_i}$. Since it is
well known that the problem of an overestimated information function exists
in adaptive testing, it is interesting to note that the same kind of problem
persists in this new method. However, our approach of estimating $\theta$ by
the least-squares method does avoid the problem of endless iterations that
the MXL method encounters for certain response patterns; therefore the
estimated $\hat{\theta}$ values will always be obtainable with the proposed
method.

The following example shows how the new estimation method works
with live data that were collected on the PLATO system in the past few years.

Example: A 48-item matrix algebra test was administered to the class of an
intermediate statistics course at the University of Illinois during 1976.
The data were gathered and calibrated by the MXL method. Since the test was
designed to eliminate guessing as much as possible, as evidenced by its
having an $\alpha_{21}$ of .947, the two-parameter logistic model was used.
Four of the 48 items were omitted in the calibration procedure, but none of
the 83 examinees was omitted. The values of $\theta$ estimated by the two
methods with various combinations of items were calculated, and the results
are summarized in Table 1.

Table 1

Comparison of Maximum Likelihood (MXL) and Least-Squares (LS)
Estimation Methods in the Matrix Algebra Pretest Items

                                  Nonconverged                Correlation   Standard
                      Multiple    cases (#)                   of LS and     deviation of
Items                 R*          LS      MXL       D**       MXL values    MXL values

35-41                 .970        0       24        .060      .974          .939
41-44, 47, 48         .764        0       36        .102      .947          .855
12-16                 .721        0       20        .153      .968          .889
31-35                 .773        0       27        .064      .957          .814
12-21                 .891        0       12        .069      .952          .848
18-21, 23-25, 29-31   .930        0       10        .085      .933          .817
22-31                 .807        0       10        .146      .919          .904

"Nonconverged cases" is the number of cases in which iterations for
$\hat{\theta}$ do not converge. The last two columns refer to the
$\hat{\theta}$ values obtained by the LS and MXL methods.

*Multiple R is also the standard deviation of the LS estimates, since
$\theta$ is rescaled so as to have a normal distribution with mean 0 and
standard deviation one.

**The sum, over subjects, of the squared differences between the
$\hat{\theta}$ estimated by the LS method and by the MXL method, divided by
the number of subjects:

$$D = \sum (\mathrm{LS} - \mathrm{MXL})^2 / N$$

The multiple Rs of the linear regression equations of $\theta$ onto 7
different sets of items are given in the second column of Table 1. The
numbers of cases (i.e., response patterns) for which iterations for
$\hat{\theta}$ did not converge are shown in columns 3 and 4 for the LS and
MXL methods, respectively. Estimated $\theta$ values were obtained for all
examinees by the LS method, while 12 to 43 percent of the examinees'
response patterns led to nonconvergence by the MXL method.

Table 2 shows the individual $\hat{\theta}$ values when items 12 through 21
were chosen as the predictors of $\theta$. If a response pattern did not lead
to convergence, then the program written by J. B. Sympson sets the value of
$\hat{\theta}$ to either +5 or -5.

In order to measure the closeness of the $\hat{\theta}$ values estimated by
the two methods, the squared differences are summed and divided by N. These
values are shown in column 5 of Table 1. The correlations of the two sets of
$\hat{\theta}$ values were also calculated and are shown in column 6.
Figure 1 displays the relation of $\hat{\theta}$ estimated by the MXL and LS
methods.
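The two closeness measures described above can be sketched as follows. The values are the LS and MXL estimates for subjects 1, 2, 3, 5, and 6 of Table 2 (all converged cases), used only as a small illustration; the function names are hypothetical, and a subset of five subjects is not expected to reproduce the entries of Table 1.

```python
import math


def mean_squared_difference(ls, mxl):
    """D = sum of squared differences between the two estimates, divided by N."""
    return sum((a - b) ** 2 for a, b in zip(ls, mxl)) / len(ls)


def pearson(u, v):
    """Product-moment correlation of two equal-length lists of estimates."""
    mu_u = sum(u) / len(u)
    mu_v = sum(v) / len(v)
    du = [a - mu_u for a in u]
    dv = [b - mu_v for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return num / den


# Estimates for subjects 1, 2, 3, 5 and 6 in Table 2 (items 12 through 21).
ls = [1.2494, 0.4881, 0.7336, -0.3268, 0.8280]
mxl = [1.4402, 0.5006, 0.5912, -0.1633, 0.8827]

d = mean_squared_difference(ls, mxl)
r = pearson(ls, mxl)
print(round(d, 4), round(r, 4))
```

Cases that the MXL program set to +5 or -5 are left out of this illustration, since those values are placeholders rather than estimates.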

Since our model assumed $\theta$ to be normally distributed with zero
mean and unit standard deviation, the original item discriminating powers
($a$'s) and difficulties ($b$'s) of the two-parameter logistic model were
adjusted so that the $\theta$ there follows the same distribution. Table 3
shows the list of adjusted $a$'s and $b$'s obtained from the two-parameter
logistic model, and the $\beta$ weights, for items 12 through 21.
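As a rough sketch of how the weights in Table 3 might be applied to one response pattern, the code below evaluates Equation (26) on 0/1 item scores expressed as deviations from item means. Two points are assumptions made for the illustration: the item means are approximated from the adjusted $a$ and $b$ by Equation (27) rather than taken from the sample, and the $\beta$ weights are applied directly to deviation scores. The estimates therefore need not match Table 2 exactly. The function names are hypothetical.

```python
import math

# Adjusted a, b and beta weights for items 12 through 21, copied from Table 3.
ITEMS = {
    12: (0.269, 1.049, -0.014),
    13: (0.497, -0.025, -0.027),
    14: (0.469, -0.213, 0.325),
    15: (0.699, -0.483, 0.224),
    16: (0.733, -0.864, 0.361),
    17: (0.815, 0.485, 0.409),
    18: (0.929, -0.619, 0.736),
    19: (0.674, 0.485, 0.219),
    20: (0.712, 0.014, 0.406),
    21: (0.608, 0.137, 0.191),
}


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def approx_item_mean(a, b):
    """Approximate item mean from a and b (assumed form of Equation 27)."""
    return normal_cdf(-a * b / math.sqrt(1.0 + a * a))


def ls_theta(responses):
    """Least-squares estimate for a dict mapping item number to a 0/1 score."""
    return sum(
        beta * (responses[item] - approx_item_mean(a, b))
        for item, (a, b, beta) in ITEMS.items()
    )


all_right = {item: 1 for item in ITEMS}
all_wrong = {item: 0 for item in ITEMS}
print(round(ls_theta(all_right), 4), round(ls_theta(all_wrong), 4))
```

Whatever item means are used, the gap between the all-correct and all-incorrect estimates equals the sum of the $\beta$ weights, so the estimate is always finite and no iteration is involved.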

SUMMARY 

A latent trait variable was estimated by the least-squares method,
in other words by regressing $\theta$ onto a set of observed scores on given
test items. It was demonstrated that the standardized coefficient for an
item entered at any stage in the linear regression equation for $\theta$ can
be calculated from a sequence of recursively defined intervening variables
$\alpha_{ik}$; the earlier $\beta$s remain unchanged in the process. The
$\beta_i$ are also functions of the item discriminating powers $a_i$ in the
two-parameter latent trait model, which have been calibrated previously. The
constant term of the linear multiple regression equation is a function of
the $a$'s and $b$'s.

If the $\beta$ weights are calculated at the same time as the $a$'s and
$b$'s in the two-parameter latent trait model are calibrated, then the
least-squares estimate of $\theta$ will be easily and always obtained for
any set of items given to a student, as in an adaptive testing situation.

The advantage of the least-squares method over the maximum likelihood
method is that values of $\hat{\theta}$ are always obtainable even for a
small number of examinees and a small number of items, while the latter
method too often fails to provide convergent $\hat{\theta}$ values in such
cases.

The accuracies of the estimates were compared, and it was concluded
that the least-squares method is just as good as the maximum likelihood
method. But it should be noted that, since the comparison was made between
the $\hat{\theta}$s estimated by the LS and MXL methods, there is no way of
knowing the exact degree of accuracy of the new method in estimating the
true value of a latent trait variable $\theta$. A Monte Carlo study would
provide an answer to this question.


Table 2

Values of $\theta$ Estimated by MXL and LS
using Items 12 through 21

  #      LS        MXL         #      LS        MXL         #      LS        MXL

  1    1.2494    1.4402       29    0.0242   -0.1744       57   -1.2829   -1.5943
  2    0.4881    0.5006       30    0.4872    0.9962       58    0.2303    0.1992
  3    0.7336    0.5912       31    0.8262    0.6034       59    0.9119    1.1607
  4   -1.6075   -5.0000       32    1.2365    1.8852       60    0.0953    0.0601
  5   -0.3268   -0.1633       33    0.9992    1.1999       61    0.2291   -0.0370
  6    0.8280    0.8827       34   -1.6075   -5.0000       62    0.9119    1.1607
  7   -1.0224   -0.8850       35    0.0608   -0.0200       63   -0.2867   -0.4722
  8   -1.0224   -0.8850       36    0.2845    0.2772       64    0.4442    0.0059
  9   -0.30C3   -0.3630       37    1.2229    5.0000       65    0.0737   -0.1128
 10   -1.6075   -5.0000       38    0.4062    0.3835       66   -0.6160   -0.5620
 11   -0.8853   -1.0096       39    0.8545    0.5872       67   -0.1050   -0.4666
 12    0.2845    0.2772       40    1.0175    0.9856       68    1.0440    0.6641
 13   -0.7244   -0.4572       41    1.2229    5.0000       69    1.2365    1.8852
 14   -1.6075   -5.0000       42   -0.1858   -0.5673       70   -1.0813   -0.6937
 15    0.0378   -0.2827       43    0.4216    0.4777       71   -0.7567   -0.4943
 16   -0.9215   -1.0077       44    0.6219    0.3874       72    0.8144    1.0902
 17    1.2365    1.8852       45   -0.9821   -0.9083       73   -0.3282   -0.5923
 18   -0.4654   -0.7830       46   -0.5066   -0.4120       74   -1.3838   -1.3554
 19   -1.6075   -5.0000       47    0.4041    0.3360       75   -0.3191   -0.5094
 20   -0.0023    0.0281       48    0.0378   -0.2827       76   -0.7997   -0.5020
 21   -0.4860   -0.1790       49   -0.9480   -0.7564       77    1.2229    5.0000
 22    0.7071    0.8879       50    0.4062    0.3835       78   -0.5737   -0.6684
 23   -0.2600   -0.3813       51    0.4481    0.2478       79    1.2365    1.8852
 24    0.9119    1.1607       52   -0.0954   -0.2269       80    0.1150   -0.1950
 25   -0.3559   -0.6202       53    1.2229    5.0000       81   -1.2147   -1.1420
 26    1.2229    5.0000       54   -1.2829   -1.5943       82    0.6388    0.5829
 27   -1.6075   -5.0000       55    1.0175    0.9856       83    1.2229    5.0000
 28   -1.2726   -0.9921       56   -0.9119   -0.7588


Table 3

$\beta$ Weights of the Least-Squares Estimation Method

Items     a*        b*       $\beta$

 12      .269     1.049      -.014
 13      .497     -.025      -.027
 14      .469     -.213       .325
 15      .699     -.483       .224
 16      .733     -.864       .361
 17      .815      .485       .409
 18      .929     -.619       .736
 19      .674      .485       .219
 20      .712      .014       .406
 21      .608      .137       .191

*$a$'s and $b$'s from the two-parameter logistic model were adjusted so
that $\theta$ is normally distributed with zero mean and unit standard
deviation.